# INT4 Quantization

Mistral Small 3.1 24B Instruct 2503 Quantized.w4a16

An INT4-quantized version of Mistral-Small-3.1-24B-Instruct-2503, optimized and released by Red Hat (Neural Magic). It is suited to fast-response dialogue agents and other low-latency inference scenarios.

Safetensors Supports Multiple Languages

Gemma 3 4b It GPTQ 4b 128g

An INT4-quantized version of the gemma-3-4b-it model that significantly reduces storage and compute requirements.

Whisper Large V3.w4a16

A quantized version of openai/whisper-large-v3 that stores weights in INT4 and keeps activations in FP16 (W4A16), suitable for inference with vLLM.

Speech Recognition

Transformers English
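The W4A16 scheme mentioned above can be illustrated with a minimal sketch: weights are rounded to signed 4-bit integers with one scale per group, and are expanded back to floating point when they meet the (unquantized) activations. The function names, the symmetric rounding rule, and the group size of 4 are assumptions chosen for illustration; they are not taken from the model card, and production kernels pack the integers and fuse these steps.

```python
def quantize_int4(weights, group_size=4):
    """Symmetric per-group quantization to the signed 4-bit range [-8, 7].

    Illustrative only: each group shares one floating-point scale.
    """
    q, scales = [], []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        max_abs = max(abs(w) for w in group)
        # Map the largest magnitude in the group to 7; avoid dividing by zero.
        scale = max_abs / 7 if max_abs > 0 else 1.0
        scales.append(scale)
        q.extend(max(-8, min(7, round(w / scale))) for w in group)
    return q, scales


def dequantize(q, scales, group_size=4):
    """Expand 4-bit integers back to floats using their group scale."""
    return [v * scales[i // group_size] for i, v in enumerate(q)]


def dot_w4a16(q, scales, activations, group_size=4):
    """Dot product with INT4 weights and full-precision activations."""
    weights = dequantize(q, scales, group_size)
    return sum(w * a for w, a in zip(weights, activations))


weights = [0.7, -0.35, 0.1, 0.0, 7.0, -3.5, 1.0, 0.0]
q, scales = quantize_int4(weights)
approx = dequantize(q, scales)
```

With this rounding rule, each reconstructed weight differs from the original by at most half of its group's scale, which is why smaller groups (and therefore more scales) tend to preserve accuracy better at the cost of extra storage.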

Svdq Int4 Flux.1 Depth Dev

An INT4-quantized version of FLUX.1-Depth-dev that generates images from text descriptions while following the structure of an input image. Compared to the original BF16 model, it reduces memory use by approximately 4x and runs 2-3x faster.

Image Generation English
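The roughly 4x memory figure is consistent with simple arithmetic: BF16 stores each weight in 16 bits, INT4 in 4. The sketch below works this out, assuming weight storage dominates and, optionally, that each group of weights carries one 16-bit scale. The helper names, the scale width, and the parameter count in the usage lines are hypothetical; real checkpoints also hold unquantized layers and other metadata, so measured savings will differ.

```python
def weight_memory_gib(num_params, bits_per_weight):
    """Approximate weight storage in GiB for a given bit width."""
    return num_params * bits_per_weight / 8 / 2**30


def compression_ratio(baseline_bits, quant_bits, group_size=None, scale_bits=16):
    """Ratio of baseline to quantized bits per weight.

    If group_size is given, the cost of one scale per group is
    amortized over the weights in that group (an assumed layout).
    """
    overhead = scale_bits / group_size if group_size else 0
    return baseline_bits / (quant_bits + overhead)


ideal = compression_ratio(16, 4)                      # no scale overhead
grouped = compression_ratio(16, 4, group_size=128)    # one scale per 128 weights

# Hypothetical model with 8 billion parameters, for illustration only.
bf16_gib = weight_memory_gib(8e9, 16)
int4_gib = weight_memory_gib(8e9, 4)
```

Under these assumptions the ideal ratio is exactly 4, and per-group scales pull it slightly below 4, which matches the "approximately 4x" wording.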

FLUX.1 Dev Qint4

A version of the FLUX.1-dev text-to-image generation model quantized to INT4 using Optimum Quanto, suitable for non-commercial use.

Text-to-Image English

Meta Llama 3.1 8B Instruct Quantized.w4a16

A quantized version of Meta-Llama-3.1-8B-Instruct that reduces disk space and GPU memory requirements, suitable for English-language chat assistants in business and research settings.

Large Language Model

Transformers Supports Multiple Languages

Meta Llama 3.1 70B Instruct AWQ INT4

An INT4-quantized version of Llama 3.1 70B Instruct, quantized with AutoAWQ and suitable for multilingual dialogue scenarios.

Large Language Model

Transformers Supports Multiple Languages

Meta Llama 3.1 8B Instruct AWQ INT4

An INT4-quantized version of Llama 3.1 8B Instruct, quantized with AutoAWQ and suitable for multilingual dialogue scenarios.

Large Language Model

Transformers Supports Multiple Languages

Whisper Large Onnx Int4 Inc

Whisper is a pre-trained model for automatic speech recognition (ASR) and speech translation. This repository provides the Whisper large model in ONNX format with INT4 weight quantization, produced with Intel® Neural Compressor and Intel® Extension for Transformers.

Speech Recognition
